Analogy Training Multilingual Encoders
Authors
Abstract
Language encoders encode words and phrases in ways that capture their local semantic relatedness, but are known to be globally inconsistent. Global inconsistency can seemingly be corrected for, in part, by leveraging signals from knowledge bases, but previous results are partial and limited to monolingual English encoders. We extract a large-scale multilingual, multi-word analogy dataset from Wikidata for diagnosing and correcting global inconsistencies, and then implement a four-way Siamese BERT architecture for grounding multilingual BERT (mBERT) through analogy training. We show that analogy training not only improves the global consistency of mBERT, as well as the isomorphism of language-specific subspaces, but also leads to consistent gains on downstream tasks such as bilingual dictionary induction and sentence retrieval.
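The abstract describes the four-way Siamese setup only at a high level. The snippet below is a minimal sketch of what such an architecture could look like: one mBERT encoder with shared weights embeds all four analogy terms, and a small head scores whether a : b :: c : d holds. The mean pooling, the offset-based scoring head, and the binary cross-entropy loss are illustrative assumptions, not details taken from the paper.

# Hedged sketch of a four-way Siamese mBERT analogy scorer (assumptions noted above).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SiameseAnalogyModel(nn.Module):
    def __init__(self, model_name="bert-base-multilingual-cased"):
        super().__init__()
        # A single encoder with shared weights embeds all four analogy terms.
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Assumed scoring head: compare the two relation offsets (b - a and d - c)
        # and emit one logit for "a : b :: c : d holds".
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def encode(self, batch):
        # Mean-pool token states over the attention mask (one assumed pooling choice).
        states = self.encoder(**batch).last_hidden_state
        mask = batch["attention_mask"].unsqueeze(-1).float()
        return (states * mask).sum(1) / mask.sum(1).clamp(min=1e-9)

    def forward(self, a, b, c, d):
        ea, eb, ec, ed = (self.encode(t) for t in (a, b, c, d))
        return self.classifier(torch.cat([eb - ea, ed - ec], dim=-1)).squeeze(-1)

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = SiameseAnalogyModel()
enc = lambda s: tok(s, return_tensors="pt", padding=True)
logit = model(enc("Paris"), enc("France"), enc("Berlin"), enc("Germany"))
loss = nn.functional.binary_cross_entropy_with_logits(logit, torch.ones(1))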
Similar resources
Gradual training of deep denoising auto encoders
Stacked denoising auto encoders (DAEs) are well known to learn useful deep representations, which can be used to improve supervised training by initializing a deep network. We investigate a training scheme of a deep DAE, where DAE layers are gradually added and keep adapting as additional layers are added. We show that in the regime of mid-sized datasets, this gradual training provides a small ...
Gradual Training Method for Denoising Auto Encoders
Stacked denoising auto encoders (DAEs) are well known to learn useful deep representations, which can be used to improve supervised training by initializing a deep network. We investigate a training scheme of a deep DAE, where DAE layers are gradually added and keep adapting as additional layers are added. We show that in the regime of mid-sized datasets, this gradual training provides a small ...
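The gradual scheme described in the two items above, where DAE layers are added one at a time while all previously added layers keep adapting, could be implemented roughly as follows. This is an illustrative sketch only; the layer sizes, Gaussian corruption, and optimizer settings are assumptions, not the authors' configuration.

# Hedged sketch: gradual training of a stacked denoising auto-encoder.
import torch
import torch.nn as nn

def train_gradual_dae(data, layer_sizes=(784, 256, 64), noise_std=0.3,
                      epochs_per_stage=5, lr=1e-3):
    encoder, decoder = nn.ModuleList(), nn.ModuleList()
    for depth in range(1, len(layer_sizes)):
        # Add the next encoder/decoder pair; earlier layers stay trainable,
        # unlike greedy layer-wise pre-training, which would freeze them.
        encoder.append(nn.Linear(layer_sizes[depth - 1], layer_sizes[depth]))
        decoder.insert(0, nn.Linear(layer_sizes[depth], layer_sizes[depth - 1]))
        opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
        for _ in range(epochs_per_stage):
            for x in data:
                noisy = x + noise_std * torch.randn_like(x)   # denoising corruption
                h = noisy
                for layer in encoder:
                    h = torch.relu(layer(h))
                for i, layer in enumerate(decoder):
                    h = layer(h) if i == len(decoder) - 1 else torch.relu(layer(h))
                loss = nn.functional.mse_loss(h, x)           # reconstruct the clean input
                opt.zero_grad()
                loss.backward()
                opt.step()
    return encoder, decoder

# Toy usage: random mini-batches standing in for a real dataset.
toy_batches = [torch.rand(32, 784) for _ in range(10)]
enc_layers, dec_layers = train_gradual_dae(toy_batches, epochs_per_stage=1)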
Training Auto-encoders Effectively via Eliminating Task-irrelevant Input Variables
Auto-encoders are often used as building blocks of deep network classifiers to learn feature extractors, but task-irrelevant information in the input data may lead to bad extractors and result in poor generalization performance of the network. In this paper, via dropping the task-irrelevant input variables, the performance of auto-encoders can be obviously improved. Specifically, an importance-bas...
Is Joint Training Better for Deep Auto-Encoders?
Traditionally, when generative models of data are developed via deep architectures, greedy layer-wise pre-training is employed. In a well-trained model, the lower layer of the architecture models the data distribution conditional upon the hidden variables, while the higher layers model the hidden distribution prior. But due to the greedy scheme of the layerwise training technique, the parameter...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i14.17524